

# Virtually Solving Debug Challenges of the MPSoC Era

MPSoC 2008, Aachen

Kevin Smart, Senior Manager R&D; Frank Schirrmeister, Product Marketing Director; Filip Thoen, Solution Architect

Synopsys, Inc.

# **Executive Summary**

- MPSoCs will have a daunting number of processors
- Software will be challenging, especially migration and debug
- Some debug examples
- Virtual platforms offer the solution
- SystemC TLM-2.0
- Modeling Style Examples
- Demo
- Wrap & Questions



# **ITRS Forecast**



Source: International Technology Roadmap for Semiconductors 2007



# **Consumer Portable: Processor Driven**



Source: ITRS 2007



# **Market Dynamics**

Synopsys SNUG Data confirm the software trend!



What percentage of your total project effort is spent on software development (vs. hardware development) during design?

2008, N = 404; margin of error = +/- 5%

Source: Synopsys San Jose SNUG Survey



# **Some Challenges**



- How to distribute software across cores?
  - Lots of legacy
  - Need to be able to migrate existing serial code
- Debug
  - Races, deadlocks, memory corruption
- Performance
  - Stalls etc.
- Need lots of insight into the actual platform and the execution of software on it



# SOME DEBUG CHALLENGES



©Synopsys 2008

# **Data Races**

Leading to unpredictable results

- Two processes run in parallel accessing the same set of data
- While process A is accessing the data, it is interrupted and process B modifies the data
- The consistency of the data read by process A is unpredictable
- For debugging, users need good insight into the platform and its memory
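The interleaving described above can be replayed deterministically in a few lines of plain C++. This is only a sketch: the `Shared` record, its invariant (`lo == hi`) and the function names are assumptions made for illustration, and the "interrupt" is modelled by an explicit call rather than by real threads.

```cpp
#include <cassert>
#include <utility>

// Hypothetical shared record: the writer always keeps lo == hi.
struct Shared {
    int lo = 0;
    int hi = 0;
};

// Process B: updates both fields, so the record is consistent
// before and after the call.
inline void writer_update(Shared& s, int v) {
    s.lo = v;
    s.hi = v;
}

// Process A: reads both fields. If interrupt_between is true, process B
// runs between A's two reads, which models the interleaving on the slide.
inline std::pair<int, int> reader_snapshot(Shared& s, bool interrupt_between,
                                           int new_value) {
    assert(s.lo == s.hi);  // precondition: the record starts out consistent
    int lo = s.lo;
    if (interrupt_between) {
        writer_update(s, new_value);
    }
    int hi = s.hi;
    return {lo, hi};
}

// A snapshot is consistent when both halves come from the same update.
inline bool is_consistent(const std::pair<int, int>& snap) {
    return snap.first == snap.second;
}
```

Without the interruption the snapshot is consistent. With it, process A sees the old `lo` and the new `hi`, a torn read that neither process ever wrote as a whole.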





# Deadlocks

Bringing the MP application to a halt ...

- Two processes run in parallel, their data accesses are protected by semaphores
- Each process locks a semaphore that the other process needs next
- Deadlock occurs when the processes acquire the semaphores in conflicting orders
- For debugging, users need good insight into the platform and its memory
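The lock-ordering problem can be illustrated with a small deterministic model instead of real threads. The sketch below assumes two processes, two semaphores (numbered 0 and 1) and a round-robin scheduler. All names are hypothetical, and this is not how any particular debugger detects deadlocks.

```cpp
#include <cassert>

// Each process acquires two semaphores: `first`, then the other one.
struct Process {
    int first = 0;      // index (0 or 1) of the semaphore taken first
    int held = 0;       // how many semaphores it holds so far
    bool done = false;  // true once it has held both and released them
};

// Round-robin model: on each turn a process tries to take its next
// semaphore. Returns true if a turn passes in which nobody can progress.
inline bool deadlocks(int a_first, int b_first) {
    assert((a_first == 0 || a_first == 1) && (b_first == 0 || b_first == 1));
    Process procs[2];
    procs[0].first = a_first;
    procs[1].first = b_first;
    int owner[2] = {-1, -1};  // owner[s] = process holding semaphore s
    int finished = 0;
    while (finished < 2) {
        bool progress = false;
        for (int p = 0; p < 2; ++p) {
            Process& pr = procs[p];
            if (pr.done) continue;
            int want = (pr.held == 0) ? pr.first : 1 - pr.first;
            if (owner[want] != -1) continue;  // blocked by the other process
            owner[want] = p;
            ++pr.held;
            progress = true;
            if (pr.held == 2) {  // critical section done: release both
                owner[0] = -1;
                owner[1] = -1;
                pr.done = true;
                ++finished;
            }
        }
        if (!progress) return true;  // everyone is blocked: deadlock
    }
    return false;
}
```

In this model, `deadlocks(0, 0)` returns `false` because both processes use the same acquisition order, while `deadlocks(0, 1)` returns `true`: each process holds the semaphore the other one needs next.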





# Stalls

#### Locks causing performance issues

- Not a functional bug, but one with a heavy impact on performance
- Happens when several parallel processes access the same locked area and spend much of their time waiting for their turn



- For debugging, users need good insight into the platform and its memory
- Also, timing annotations are useful to quantify performance impact
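To show how timing annotations turn a stall into a number, the following sketch computes how long each process waits for a single lock. It assumes first-come-first-served arbitration, arrival times given in ascending order, and one fixed hold time per critical section. These assumptions and the function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns, for each process, the time it spends waiting for one shared lock.
// arrivals: time at which each process requests the lock (ascending order).
// hold_time: annotated duration of the critical section.
inline std::vector<long> lock_wait_times(const std::vector<long>& arrivals,
                                         long hold_time) {
    assert(hold_time >= 0);
    assert(std::is_sorted(arrivals.begin(), arrivals.end()));
    std::vector<long> waits;
    long lock_free_at = 0;  // time at which the lock next becomes available
    for (long arrival : arrivals) {
        long start = std::max(arrival, lock_free_at);
        waits.push_back(start - arrival);
        lock_free_at = start + hold_time;
    }
    return waits;
}

// Total stall time across all processes.
inline long total_stall(const std::vector<long>& waits) {
    long sum = 0;
    for (long w : waits) sum += w;
    return sum;
}
```

For arrivals at times 0, 0, 5 and 40 with a hold time of 10, the waits are 0, 10, 15 and 0, so the lock costs 25 time units of stall in total. The last process arrives after the lock is free and does not stall at all.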



# SOLUTION: VIRTUAL PLATFORMS



#### **Current approaches will fail!**

A Chip Design Project P&L Without Virtual Platforms



Source: Derived from IBS Data, 130nm, Wireless Application



#### Virtual Platforms offer the solution

Like Hardware – Only Much, Much Sooner!



Source: Derived from IBS Data, 130nm, Wireless Application



### **Virtual Platform Impact**

Gain full insight into design, get to market early and increase profit





#### **Requirements / Technology Vectors**

Using TL Models for Early Software Development

- Completeness: model of the complete system
  - Required as "system software" requires all these aspects
  - Coverage required for: (1) board-level, (2) SOC models,
    (3) system I/O, (4) system/device user interface
- Extreme simulation performance: > 10-50 MIPS for the full platform
  - Higher performance allows bigger software stacks to be run
  - Required to support fast "edit-compile-debug" development cycle
- Binary compatibility
  - Run actual target binaries, requiring no changes between simulator & target
  - Avoids porting effort & reduces risk & enables SW optimizations
- Visibility & controllability
  - Leverage simulator-unique features to improve visibility into "black-box" SoCs
  - Opportunity for dramatic SW development productivity improvement (2-5x)
- Virtual I/O: emulates system I/O
  - Allows SW developer to test end user scenarios (e.g. contact sync)
  - Acts as a real SW development target with real-world connectivity
- Supports standard SW development interfaces & tools
  - No changes in SW development cycle (no new tools, methodologies, ...)
  - No new debug infrastructure needs to be developed





# **SYSTEMC TLM-2.0**



#### The Impact of SystemC TLM-2.0

Enabling Interoperability and Scalability

Previously proprietary (backdoor) APIs and new additions have now been standardized:

- Direct Memory Interface (DMI)
  - Direct backdoor access into memory
  - Allows un-inhibited ISS execution
- LT (Loosely Timed) modeling
  - Declare but don't execute timing
  - Allows speed/accuracy trade-offs
- Temporal Decoupling
  - Only synchronize when necessary
  - Allows multicore speedup





# The Impact of SystemC TLM-2.0

Synopsys has the complete portfolio!

- Innovator
  - PV (LT) modeling
  - PV+T (AT) "timing annotation"
- DesignWare® IP
  - Implementation, Verification and System-Level IP
- Modeling Services
- VCS
  - For hard cycle-accurate (CA) requirements running in software
- Synplicity HAPS





# **MODELING STYLE EXAMPLES**





# Loosely-timed (LT)

- Only sufficient timing detail to boot an O/S and run multi-core systems
- Processes can run ahead of simulation time (temporal decoupling), for faster performance
- Uses the direct memory interface (DMI), so there is no contention
- The quantum is user-configurable. Data races, deadlocks and stalls may not be observed if the quantum is too long
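The effect of the quantum on what a debugger can observe can be sketched with a toy scheduler. It assumes two processes, steps that each cost one time unit, and a process that runs ahead until it has used up its quantum before yielding. In TLM-2.0 this bookkeeping is typically done with `tlm_utils::tlm_quantumkeeper`, which this plain C++ stand-in does not use. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the order in which processes 'A' and 'B' execute their steps.
// A process runs ahead for up to `quantum` steps of local time, then
// synchronizes and lets the other process run.
inline std::string schedule(int steps_per_process, int quantum) {
    assert(steps_per_process >= 0 && quantum >= 1);
    std::string trace;
    int remaining[2] = {steps_per_process, steps_per_process};
    while (remaining[0] > 0 || remaining[1] > 0) {
        for (int p = 0; p < 2; ++p) {
            for (int local = 0; local < quantum && remaining[p] > 0; ++local) {
                trace += static_cast<char>('A' + p);
                --remaining[p];
            }
        }
    }
    return trace;
}

// Number of points at which execution switches between the processes.
inline int context_switches(const std::string& trace) {
    int switches = 0;
    for (std::size_t i = 1; i < trace.size(); ++i) {
        if (trace[i] != trace[i - 1]) ++switches;
    }
    return switches;
}
```

With four steps per process, a quantum of 1 gives the fully interleaved trace `ABABABAB` (7 switches), a quantum of 2 gives `AABBAABB` (3 switches), and a quantum of 4 or more gives `AAAABBBB` (1 switch). Fewer switches mean a faster simulation, but also fewer interleavings in which a race or deadlock could show up.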



# Approximately-timed (AT)

[Figure: timeline from 0 to 50 showing Process 1, Process 2 and Process 3 with annotated delays]

- Sufficient for architectural exploration
- Processes run in lock-step with simulation time

- A model may switch between the LT and AT coding styles during simulation: run rapidly through the reset and boot sequence at the LT level, then switch to AT modeling for more detailed analysis once the simulation has reached an interesting stage



#### **SystemC LT System Model**

Fast Transaction-level Models - Technology Details




### SystemC AT System Model

Features & Capabilities



- Hardware / Timing Features Modeled
  - Buses
    - Contention & Arbitration
    - Pipelining & concurrency
    - Burst-at-once (typical), or individual phases
  - Slaves
    - Access timing
    - Processing delay
  - CPU
    - Cache(s)
    - Instruction cycle timing: CPI, or . . .
  - Memory controllers
    - Static delay ('wait states'), or
    - Pages / banks
    - Dynamic (transaction re-ordering)

#### Performance Statistics Generated

- CPU: cycle counts, cache hit/miss rates, average cache line age, ...
- Bus: effective bandwidth, # transactions, ...
- Memory controllers: # page hits/misses, # reschedules, queue size
- System-specific statistics
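As an illustration of how one of these statistics might be gathered, the sketch below counts hits and misses in a direct-mapped cache model. The cache organisation, the class names and the address stream are assumptions chosen for brevity and do not describe any specific AT model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hit/miss counters of the kind a performance view would display.
struct CacheStats {
    unsigned long hits = 0;
    unsigned long misses = 0;
    double hit_rate() const {
        unsigned long total = hits + misses;
        return total == 0 ? 0.0 : static_cast<double>(hits) / total;
    }
};

// Hypothetical direct-mapped cache that only tracks tags, not data.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t lines, std::size_t line_bytes)
        : tags_(lines, ~std::uint64_t{0}),  // all-ones marks an empty line
          line_bytes_(line_bytes) {
        assert(lines > 0 && line_bytes > 0);
    }

    // Record one access and update the statistics.
    void access(std::uint64_t addr) {
        std::uint64_t block = addr / line_bytes_;
        std::size_t index = static_cast<std::size_t>(block % tags_.size());
        if (tags_[index] == block) {
            ++stats_.hits;
        } else {
            ++stats_.misses;
            tags_[index] = block;  // evict whatever was in this line
        }
    }

    const CacheStats& stats() const { return stats_; }

private:
    std::vector<std::uint64_t> tags_;
    std::size_t line_bytes_;
    CacheStats stats_;
};
```

For a cache with 4 lines of 16 bytes and the address stream 0, 4, 8, 16, 0, 64, 0, the model reports 3 hits and 4 misses. The last two misses come from addresses 0 and 64 evicting each other from the same line, the kind of conflict a cache heat map is meant to expose.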



# **Required Logging & Visualization**



#### **Predefined Views**

- Bus transactions / traffic view
- Bus transaction statistics
- Timing display of system (power) events
- Memory
  - Memory utilization
  - (Effective) Memory bandwidth
  - Memory heat map

- MMU: TLB hits/misses
- Cache statistics
  - Cache hit/miss
  - Cache eviction rate
  - Cache heat map
- IRQ latency, together with laxity window
- Bus bandwidth history
- Instruction traces

# A TLM-2.0 Virtual Platform

1. Platform in SystemC using TLM-2.0 LT modeling
2. Runs at real-time speed
3. Interact live with virtual platform design
4. Connect to environment via Virtual I/O
5. Run real-time video/audio
6. Start/Stop/Break for debug in multicore scenarios





#### **Texas Instruments OMAP 2420**

Multicore Debug Example





# SUMMARY




#### Virtual Platforms in the Design Flow

Leading from functional specification to RTL Implementation



**RTL to GDSII Implementation Flow (Discovery VCS to Galaxy)** 



# **Executive Summary**

- MPSoCs will have a daunting number of processors
- Software will be challenging, especially migration and debug
- Some debug examples
- Virtual platforms offer the solution
- SystemC TLM-2.0
- Modeling Style Examples
- Demo
- Wrap & Questions

